ABSTRACT

Social networks often encode community structure using multiple
distinct types of links between nodes. In this paper we introduce a
novel method to extract information from such multi-layer networks,
where each type of link forms its own layer. Using the concept of
Pareto optimality, community detection in this multi-layer setting is
formulated as a multiple criterion optimization problem. We propose an
algorithm for finding an approximate Pareto frontier containing a
family of solutions. The power of this approach is demonstrated on a
Twitter dataset, where the nodes are hashtags and the layers correspond
to (1) behavioral edges connecting pairs of hashtags whose temporal
profiles are similar and (2) relational edges connecting pairs of
hashtags that appear in the same tweets.

Index Terms: Community detection, multi-layer networks, Twitter

1.  INTRODUCTION 

Social networks have become rich sources of data for network analysis,
where objectives might include community detection, edge prediction,
node behavior prediction, and model inference. However, it has become
increasingly difficult to extract meaningful information from these
networks due to the explosion in both the volume of data collected and
the diversity of available data types. In this paper we focus on
addressing the latter problem for the task of community detection;
specifically, we consider networks containing multiple layers of
interactions between nodes.

For many social network applications, measures of association between
pairs of nodes may be available along multiple dimensions. For example,
graph edges may be observed directly in the data, or they may be
inferred from actions of the agents in the network. We make the
distinction between relational links that are observed explicitly and
behavioral links that are inferred from ancillary data describing node
behavior. Examples of relational links between users might include
observed interactions over a period of time, mutually established
friendship connections, or email sender-receiver relationships.
Likewise, behavioral links might be drawn between users who post items
with similar semantic content, like the same bands or movies, or
exhibit correlated activity over time. Further, it is possible to have
multiple types of relational and behavioral links; for instance, there
could be both a professional and a personal social network over the
same set of users. Networks with multiple distinct edge types have been
called multi-layer [?], multi-level [?], multi-relational, or
multiplex [?] networks.

In a multi-layer network, each layer may have a unique topology. The
simplest way to apply existing network analysis algorithms (which
generally assume homogeneous edges) is to "flatten" the data, i.e., to
combine all the different types of links into a single-layer network.
This can be accomplished in various ways, for instance, by performing a
logical AND or OR on the layer-specific adjacency matrices, or by
computing their weighted (and possibly thresholded) average. However,
this approach has many hidden pitfalls; for example, if one of the
layers is noisier than the others then it probably should not receive
equal consideration when attempting community detection.
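The flattening options named above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `flatten_layers`, its parameters, and the toy matrices are hypothetical and do not come from the paper, and the layers are assumed to be symmetric adjacency matrices over the same node set.

```python
import numpy as np


def flatten_layers(layers, mode="or", weights=None, threshold=None):
    """Combine layer-specific adjacency matrices into one single-layer matrix.

    mode="and":     keep an edge only if it is present in every layer
    mode="or":      keep an edge if it is present in at least one layer
    mode="average": weighted average of the layers, optionally thresholded
    """
    stack = np.stack([np.asarray(a, dtype=float) for a in layers])
    if mode == "and":
        return (stack > 0).all(axis=0).astype(int)
    if mode == "or":
        return (stack > 0).any(axis=0).astype(int)
    if mode == "average":
        avg = np.average(stack, axis=0, weights=weights)
        if threshold is None:
            return avg
        return (avg >= threshold).astype(int)
    raise ValueError(f"unknown mode: {mode}")


# Two toy layers on three nodes (illustrative data only).
A1 = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
A2 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])

flat_and = flatten_layers([A1, A2], mode="and")
flat_or = flatten_layers([A1, A2], mode="or")
flat_avg = flatten_layers([A1, A2], mode="average",
                          weights=[0.75, 0.25], threshold=0.5)
```

With the weights chosen here, the thresholded average simply reproduces the first layer, which illustrates how much the flattened result depends on how the layers are weighted.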

A better strategy, we argue, is to directly analyze the multi-layer
networks without flattening. To show how this can be done, we propose a
new method of community detection for multi-layer networks. Our
approach employs multi-objective optimization, taking into account
multiple layers of network structure, which is then used to find a
community partition. We show that this algorithm can provide
significantly better community detection than that obtained by standard
single-layer techniques.

The paper proceeds as follows. In Sec. 2 we define multi-layer
networks. In Sec. 3 a Pareto optimality approach to multi-layer
community detection is proposed, and in Sec. 4 we apply the proposed
approach to Twitter data. Finally, we discuss related work in Sec. 5
and give concluding remarks in Sec. 6.

2.  MULTI-LAYER  NETWORKS 

A multi-layer network $G = (V, \mathcal{E})$ consists of vertices
$V = \{v_1, \ldots, v_p\}$, common to all layers, and edges
$\mathcal{E} = (\mathcal{E}_1, \ldots, \mathcal{E}_M)$ in $M$ layers,
where $\mathcal{E}_k$ is the edge set for layer $k$, and
$\mathcal{E}_k = \{e^k_{v_i v_j} : v_i, v_j \in V\}$. Each edge is
undirected, though extensions to the directed case are not difficult.
The multi-layer degree of a node $i$ is the vector $d^i$ of length $M$,
with each entry $[d^i]_k$ being the degree of node $i$ on layer $k$.

The adjacency matrix and degree matrix are defined as usual for each
layer:

$$[A_k]_{ij} = e^k_{v_i v_j}, \qquad D_k = \mathrm{diag}\left([d^1]_k, [d^2]_k, \ldots, [d^p]_k\right) \tag{1}$$

Note that $D_k$ is simply a $p \times p$ diagonal matrix with the
layer-specific node degrees on the diagonal.
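As a concrete illustration of these definitions, the following sketch builds the per-layer adjacency and degree matrices from edge lists and collects the multi-layer degree of each node. It assumes unweighted, undirected edges (so every adjacency entry is 0 or 1); the helper name `layer_matrices` and the toy edge lists are hypothetical.

```python
import numpy as np


def layer_matrices(p, edge_sets):
    """Return a list of (A_k, D_k) pairs, one per layer, over p shared nodes."""
    result = []
    for edges in edge_sets:
        A = np.zeros((p, p))
        for i, j in edges:
            A[i, j] = 1.0
            A[j, i] = 1.0  # undirected: store both directions
        D = np.diag(A.sum(axis=1))  # layer-specific degrees on the diagonal
        result.append((A, D))
    return result


# Layer 0 is a path 0-1-2-3; layer 1 is a star centred on node 0.
edge_sets = [[(0, 1), (1, 2), (2, 3)], [(0, 1), (0, 2), (0, 3)]]
mats = layer_matrices(4, edge_sets)

# Row i holds the multi-layer degree of node i: one entry per layer.
multi_degree = np.array([[D[i, i] for (_, D) in mats] for i in range(4)])
```

Node 0 has degree 1 on the path layer and degree 3 on the star layer, so its multi-layer degree is the vector (1, 3).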

3.  COMMUNITY  DETECTION  VIA 
MULTIOBJECTIVE  OPTIMIZATION 

Many existing community detection algorithms involve optimization [?].
Methods that fall into this category include spectral algorithms,
modularity methods, and methods that rely on statistical inference,
particularly those that try to maximize a likelihood function. It seems
natural that a multi-layer generalization of such algorithms might
somehow combine the optimization objective functions as applied to each
individual layer; this is the basis of multi-objective optimization.

More formally, let community structure in a network be described by a
node partition $C$, where $C(i) = k$ means that node $i$ is in part
$k$. Single-objective optimization methods of community detection seek
to find the partition $\arg\min_C f(C)$ that minimizes an objective
function $f$ (which depends internally on the network structure). In
the following we consider the two-community case; more communities can
be found by a recursive use of the algorithm.

Now consider a two-layer network, and let $f_1$ and $f_2$ be objective
functions for the two layers. One obvious way of combining the layers
would be to minimize the linear combination
$\alpha f_1(C) + (1 - \alpha) f_2(C)$ over $C$, where
$\alpha \in [0, 1]$. However, linear combination may be restrictive,
especially when the objective functions are complex. A more general
approach is instead to seek the Pareto optimal solutions of the
multi-objective minimization problem:

$$\hat{C} = \arg\min_C \left[ f_1(C), f_2(C) \right]. \tag{2}$$

A solution to the multi-objective optimization problem (2) is said to
be weakly Pareto optimal (or weakly non-dominated) if it is not
possible to decrease any objective function without increasing some
other objective function [?]. More formally, a solution $C_1$ dominates
a solution $C_2$ if $f_i(C_1) \le f_i(C_2)$ for every objective
function $f_i$ and there exists some $j$ such that
$f_j(C_1) < f_j(C_2)$. The first Pareto front is the set of weakly
non-dominated points.
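The dominance relation can be turned directly into a filter that extracts the first front from a finite set of candidate objective vectors. The sketch below is a plain quadratic-time scan, not the faster sorting routine the paper cites later; the function names and the sample points are illustrative assumptions.

```python
def dominates(u, v):
    """True if objective vector u dominates v (all objectives are minimized)."""
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better


def first_pareto_front(points):
    """Keep the points that no other point dominates (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Each tuple is (f1, f2) for one hypothetical candidate partition.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 3.5)]
front = first_pareto_front(candidates)
```

Here (3.0, 4.0) and (2.0, 3.5) are both dominated by (2.0, 3.0), so the front keeps the remaining three points, none of which improves on another in both objectives at once.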

Calculating an exact Pareto front is, in general, a challenging task.
The most popular approximate methods are genetic algorithms, which
employ biologically inspired heuristics to attempt to transform
randomly selected seed cases into solutions on the Pareto front using
propagation. More details can be found in [?], [8] and the references
therein. One disadvantage to genetic approaches is that they are not
deterministic.


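Input: $f_1$, $f_2$
Obtain optimum solutions $C_1^*$, $C_2^*$ for each layer
Initialize $C = C_1^*$
repeat
    for $i : C(i) \neq C_2^*(i)$ do
        $C^{\mathrm{new}} \leftarrow C$ with node $i$ reassigned to $C_2^*(i)$
        $\mathrm{cost}(i) \leftarrow f_2(C^{\mathrm{new}}) - f_2(C)$
    end for
    $i^* \leftarrow \arg\min_i \mathrm{cost}(i)$
    $C(i^*) \leftarrow C_2^*(i^*)$
until $C = C_2^*$
Output: non-dominated solution values taken by $C$

Fig. 1. Proposed algorithm for Pareto front identification.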

Additionally, there is no guarantee that any of the Pareto front will
be correctly identified. Finally, most genetic algorithms deal with
real-valued decision variables, while the community detection problem
has a discrete decision space.

The alternative strategy employed in this paper is based on the
Kernighan-Lin node swapping technique [9]. The objective is to find
solutions that are approximately Pareto optimal. If it is possible to
obtain a sample of solutions that are likely to be on or near the
front, these points can be sorted for non-domination very quickly [?].
In this way, a large set of solutions is filtered to find candidates
that are potentially Pareto optimal and worth further consideration.
Figure 1 shows the proposed algorithm.
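One possible reading of the node-swapping procedure in Fig. 1 is sketched below. Several things here are assumptions rather than the authors' implementation: the objectives are passed in as callables, the single-layer optima are supplied by hand, the community labels of the two optima are assumed to be aligned already, and the example objective is the two-way ratio cut in its standard form. Because $f_2(C)$ is the same for every candidate node, the sketch minimizes $f_2(C^{\mathrm{new}})$ directly, which selects the same node as minimizing the cost difference.

```python
def adjacency(n, edges):
    """Dense 0/1 adjacency matrix (list of lists) for an undirected edge list."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj


def ratio_cut(adj, part):
    """Standard two-way ratio cut (assumed form): cut * (1/|C1| + 1/|C2|)."""
    n1 = sum(1 for c in part if c == 1)
    n2 = len(part) - n1
    if n1 == 0 or n2 == 0:
        return float("inf")  # degenerate split: one community is empty
    cut = sum(adj[i][j]
              for i in range(len(part)) for j in range(len(part))
              if part[i] == 1 and part[j] == 2)
    return cut * (1.0 / n1 + 1.0 / n2)


def dominates(u, v):
    """True if objective vector u dominates v under minimization."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def pareto_swap(f1, f2, c1_star, c2_star):
    """Walk from c1_star to c2_star one node at a time, greedily on f2."""
    c, target = list(c1_star), list(c2_star)
    visited = [(f1(c), f2(c), tuple(c))]
    while c != target:
        differing = [i for i in range(len(c)) if c[i] != target[i]]

        def f2_after_switch(i):
            c_new = list(c)
            c_new[i] = target[i]
            return f2(c_new)

        i_star = min(differing, key=f2_after_switch)
        c[i_star] = target[i_star]
        visited.append((f1(c), f2(c), tuple(c)))
    # Keep only the visited partitions whose (f1, f2) values are non-dominated.
    front = [v for v in visited
             if not any(dominates(w[:2], v[:2]) for w in visited)]
    return visited, front


# Toy two-layer network on 8 nodes (illustrative data only).
clique_a = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
clique_b = [(4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7)]
layer1 = adjacency(8, clique_a + clique_b + [(3, 4)])
layer2 = adjacency(8, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
                       (6, 7), (7, 2), (2, 5), (3, 6), (4, 7)])

c1_star = [1, 1, 1, 1, 2, 2, 2, 2]  # assumed optimum for layer 1
c2_star = [1, 1, 2, 2, 2, 2, 2, 2]  # assumed optimum for layer 2

visited, front = pareto_swap(lambda c: ratio_cut(layer1, c),
                             lambda c: ratio_cut(layer2, c),
                             c1_star, c2_star)
```

The walk visits one partition per differing node, so its length is bounded by the number of nodes on which the two single-layer optima disagree, and the procedure is deterministic for fixed inputs.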

For community detection, the objective is to minimize the ratio-cut
$f_k$ for each layer $k = 1, 2$, which is built on the cut between the
two communities:

$$\mathrm{cut}(C) = \sum_{C(i) = 1,\; C(j) = 2} [A_k]_{ij} \tag{4}$$

A relaxed version of this objective function can be solved by
performing an eigendecomposition on the Laplacian $L_k = D_k - A_k$.
More details can be found in [10].
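A minimal sketch of the spectral relaxation for one layer follows. It assumes the common recipe of taking the eigenvector of the second-smallest Laplacian eigenvalue and splitting the nodes by its sign; the source does not spell out the rounding step, so that choice, the standard ratio-cut formula used to score the result, and the function name `spectral_bisection` are all assumptions.

```python
import numpy as np


def spectral_bisection(A):
    """Split a connected graph in two using the sign of the Fiedler vector."""
    A = np.asarray(A, dtype=float)
    D = np.diag(A.sum(axis=1))
    L = D - A                             # graph Laplacian of this layer
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]               # second-smallest eigenvalue
    return np.where(fiedler >= 0, 1, 2)   # community labels 1 and 2


# Two triangles joined by a single bridge edge between nodes 2 and 3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

labels = spectral_bisection(A)

# Cut as in (4), then the standard two-way ratio cut (assumed form).
cut = A[np.ix_(labels == 1, labels == 2)].sum()
ratio = cut * (1.0 / (labels == 1).sum() + 1.0 / (labels == 2).sum())
```

The sign of an eigenvector is arbitrary, so the two label values may be swapped between runs or library versions; only the grouping of nodes is meaningful.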

4.  TWITTER  DATASET 

The proposed algorithm was applied to a month of data from Twitter. A
two-layer network on hashtags was developed using tweets from October
2012. The data was obtained from the Twitter stream API at gardenhose
level access, which corresponds to 10% of all tweets over the month. A
list of hashtags and the users who tweeted them was created for each
day, as well as the volume (i.e., number of observed occurrences) of
each hashtag per day.

Hashtags that were directly connected with the presidential election or
politics were chosen out of a list of the most popular hashtags for the
month, which yielded 48 hashtags.


(a) Hashtag Volume Layer    (b) Hashtag User Layer

Fig. 2. A network visualization of two layers of the hashtag dataset
for October 10th, 2012. This example shows the differing topologies
generated by different links in a network. While we see some
similarities (for instance, nodes 38, 39, and 32 have high degree
centralities in both networks), these networks have many differences,
the most obvious being that the volume layer is not even connected,
while the user layer is connected and has a diameter of only 6.

Figure 2 shows an example of two network layers for one day on the
original set of 48 hashtags. In order to include some higher order
connections, the list was expanded by including hashtags whose volume
per day behaved similarly over the month to the first 48; this grew the
network to 515 tags.

Initially, the total volume of the hashtags was studied over time, and
real events were compared with the profile; this is shown in Figure 3.
Some events are correlated with volume; Hurricane Sandy falls on the
two-day period with the largest hashtag volume. The second presidential
debate also corresponds to a spike in hashtag volume. In contrast, the
first presidential debate is not an identifiable event in the volume
plot.

A time series of two-layer networks was created with hashtags as the
nodes. Specifically, 31 two-layer networks were created by aggregating
Tweet data over each day in the month. The first layer linked two
hashtags if any user used both hashtags on that particular day. This
layer is referred to as the hashtag user layer. The second layer linked
two hashtags if they had similar volume profiles over time.
Intuitively, two hashtags would have a link with each other if they
were popular or unpopular at the same time. So as not to take into
account too much past data, the volume correlation was calculated using
a moving window of 5 days. A Pearson correlation coefficient was used
to calculate the correlations in volume for each pair of hashtags; the
correlations then underwent a Fisher transformation and were
thresholded by a value of 1.3859, which corresponds to an approximate
5% false positive rate (in the bivariate normal case) when testing for
the presence of a positive correlation [11]. This layer is referred to
as the hashtag volume layer. Figure 4 demonstrates pictorially the
creation of the two layers, using a simple dataset of three hashtags.
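The volume-layer edge test can be sketched as follows. The function names and the sample series are hypothetical, and applying the threshold directly to the Fisher-transformed correlation is an assumption about how the stated value is used. One observation that is easy to verify: for a window of n = 5 days the Fisher-transformed correlation has approximate standard deviation 1/sqrt(n - 3), and 1.959964/sqrt(2) is about 1.3859, which matches the threshold quoted in the text. The source does not state this derivation, so it is offered only as a consistent reading.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant series: treated as uncorrelated (assumption)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher transformation, clipped so that atanh stays finite."""
    r = max(min(r, 0.999999), -0.999999)
    return math.atanh(r)


def volume_edge(x, y, threshold=1.3859):
    """Link two hashtags if their transformed correlation exceeds the threshold."""
    return fisher_z(pearson(x, y)) > threshold


WINDOW = 5  # days in the moving window, as stated in the text
z_crit = 1.959964 / math.sqrt(WINDOW - 3)  # normal quantile times std. dev.

# Daily volumes of three hypothetical hashtags over one 5-day window.
tag_a = [10, 20, 30, 40, 50]
tag_b = [12, 19, 33, 38, 52]   # rises and falls together with tag_a
tag_c = [50, 10, 40, 20, 30]   # weakly anti-correlated with tag_a

edge_ab = volume_edge(tag_a, tag_b)
edge_ac = volume_edge(tag_a, tag_c)
```

With only five samples per window the test is coarse, which is why the threshold on the correlation itself ends up being high.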

Fig. 3. Volume of observed usage of the 515 political hashtags along
with an event timeline for October 2012. Notice that while we can see
that some events correlate with hashtag usage for our dataset, this is
not true for all events that might be expected to affect political
hashtags.

We will show that one is able to obtain more information by the
proposed Pareto multi-layer analysis methods than when the two layers
are analyzed separately. To this end, the single-layer graph-cut
partitions described in Sec. 3 were computed for each day. We also
computed approximately Pareto-optimal partitions by combining the
single-layer solutions using the algorithm in Fig. 1 and selected a
single partition by using the approximate midpoint of the Pareto front.
The Adjusted Rand Index (ARI) [?] was then used to compare partitions
on different days and see how hashtag relationships change over time.
The ARI measures how similar partitions are, and can vary between -1
and 1.
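For reference, the ARI between two partitions can be computed from pair counts with the standard library alone. The sketch below uses the usual pair-counting (Hubert-Arabie) formula; the function name and the three toy daily partitions are illustrative assumptions, not data from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index of two labelings of the same items."""
    total = comb(len(labels_a), 2)
    if total == 0:
        return 1.0  # fewer than two items: nothing to disagree on
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical two-community partitions of four hashtags on three days.
days = {
    "day1": [1, 1, 2, 2],
    "day2": [2, 2, 1, 1],   # same grouping as day1 with labels swapped
    "day3": [1, 2, 1, 2],   # a different grouping
}
names = sorted(days)
heat = [[adjusted_rand_index(days[p], days[q]) for q in names] for p in names]
```

Because the index depends only on which items are grouped together, swapping the two community labels leaves it unchanged, which is what makes it suitable for comparing bisections computed independently on different days.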

Figure 5 shows heat maps of all the ARI values, both for the single
layers considered separately and for the proposed algorithm. The
hashtag user layer reflects fairly stable correlation among the two
clusters until day 16, where there is a phase transition. Note that
this phase transition also occurs on the volume layer heatmap. There is
not much similarity between days in the user network, implying that
there is not an optimal stable two-cluster solution when considering
the hashtag user layer alone, and it is difficult to extract real
events.

In the hashtag volume layer heatmap, the community structures on some
days are highly correlated with each other. In particular, the days on
which Hurricane Sandy occurs have communities that are highly
correlated. It is also interesting to note that the communities at the
end of the month are nothing like the bisected communities at the
beginning, which implies considerable temporal evolution in the
network. There is also more sparsity in the hashtag volume layer
heatmap; consequently it may be possible to detect events more easily
using this network.

The evident block structure in the Pareto combined heatmap shows that
the multi-layer algorithm eliminates similarities between the first and
second halves of the month. The Pareto combined solution holds
attributes from both the hashtag volume layer and hashtag user layer;
the structural patterns that were present in the latter half of the
month of the hashtag volume network are also present in the combined
solution. The first half of the month also has some self-similarity,
which is seen in the hashtag user layer. However, the proposed
multi-layer algorithm was able to pick out some days that were more
highly correlated than in either of the single-layer solutions. In
particular, days 3-5 are more highly correlated in the combined
solution; October 3rd was the day of the first debate. Interestingly,
the layers jointly reveal correlations between days not visible in the
independent single-layer analyses.

Fig. 4. The two layers of the Twitter hashtag network are illustrated.
At the top is the relational layer where a link between two hashtags
indicates that at least one user used both hashtags in the same Tweet.
At the bottom is the behavioral layer where a link indicates similarity
in the hashtag usage volume over time.

Fig. 5. The more highly resolved block structure in the combined
network heatmap clearly indicates that the hashtag community structure
remains quite stable and coherent over the first 15 days of October but
then breaks up into smaller clusters of coherency over the remainder of
the month. This may reflect the change of public opinion after the
second presidential debate (October 16) and the effect of Hurricane
Sandy (October 28) on Twitter hashtag volume and usage.


5.  RELATED  WORK 

With the advent of large data, there has been more opportunity to
explore this multi-layer structure. There has been some work in the
modeling and representation of multi-layer networks, and how it relates
to other studied problems [?], [3]. While there is a large body of work
in single-layer community detection [?], the multi-layer community
detection literature is less comprehensive. Hypergraphs have been
studied from a spectral perspective [14], which can be useful when
dealing with a multi-layer structure. Some work in applying
single-layer modularity methods to multi-layer structures is also
available [?]. For more information, see [?]. This technique was also
used in [?].

Multi-objective optimization has a long history [?]. Here, we are only
interested in a sorting algorithm used to find points that are possibly
Pareto optimal; this is called non-dominated sorting. The method used
in this paper is part of the evolutionary algorithm described in [?].
Some interesting application work has been done using multi-objective
optimization [?], including supervised and unsupervised learning.


6.  CONCLUSION 

Multi-level network analysis is of growing interest as we are faced
with increasingly complex data. In this paper, a method was introduced
for finding communities in a multi-layer structure; it was demonstrated
on a Twitter hashtag dataset and shown to deliver results that differ
significantly from single-layer analysis alone. The framework described
can also be used to extend other single-layer algorithms to the
multi-layer setting.